EFA for EPSY 905
Load R packages
library(readxl) #read in xlsx file
library(tidyverse) #data cleaning and manipulation
library(psych) #to conduct EFA and related tests
library(lavaan) #to conduct EFA
library(lavaanPlot) #to plot EFA objects
Read in the data
data <- read_xlsx("saq_data.xlsx")
df <- data.frame(data)
Any missing data? -- NO! :)
table(is.na(df))
FALSE
59133
Assumptions of EFA:
- Relationship between variables and latent factors are linear
- Some underlying latent structure exists
- Linear coefficients are same across participants
- Linear relationship between the variables and underlying normal distribution
- Variables are ideally continuous, and if they are continuous, should have multivariate normality
Assumptions 1-3 are conceptual assumptions of EFA. We will test assumptions 4 & 5 below.
Assumption 4: Linearity and underlying normal distribution
Histograms of variables – From these, it appears that most variables are not normally distributed, but this is not surprising considering that we are working with ordinal data.
library(patchwork) #package that puts multiple figures side by side
create_histogram <- function(data, col) {
ggplot(data, aes(x = .data[[col]])) +
geom_histogram(binwidth = 1, fill = "cornflowerblue", color = "black") +
labs(title = paste("Histogram of", col),
x = col, y = "Frequency") +
theme(aspect.ratio = 1/5)
}
histograms_ggplot <- lapply(colnames(df), function(col) {
create_histogram(df, col)
})
combined_histograms <- wrap_plots(histograms_ggplot, ncol = 4)
combined_histograms
Boxplot of variables – another way to show distributions of the variables
boxplot(df, notch = TRUE, boxfill = "cornflowerblue", whiskcol = "firebrick", pch = 16, outcol = "firebrick")
Pearson correlations between variables; it appears that we mostly have positive correlations between variables
library(corrgram) #a package to visualize correlations between variables
suppressWarnings(corrgram(df))
Looking at univariate skewness and kurtosis
According to Curran et al. (1996), univariate skew greater than 2.0 or kurtosis greater than 7.0 would indicate “severe univariate nonnormality”. We did not have any concerning univariate skew or kurtosis values. This is reassuring, however, these tests are intended for continuous variables, so this finding should be interpreted with caution.
describe(df)
    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
q01 1 2571 2.37 0.83 2 2.34 0.00 1 5 4 0.65 0.61 0.02
q02 2 2571 1.62 0.85 1 1.46 0.00 1 5 4 1.49 2.04 0.02
q03 3 2571 2.59 1.08 3 2.57 1.48 1 5 4 0.09 -0.78 0.02
q04 4 2571 2.79 0.95 3 2.74 1.48 1 5 4 0.39 -0.29 0.02
q05 5 2571 2.72 0.96 3 2.68 1.48 1 5 4 0.46 -0.44 0.02
q06 6 2571 2.23 1.12 2 2.09 1.48 1 5 4 0.93 0.15 0.02
q07 7 2571 2.92 1.10 3 2.89 1.48 1 5 4 0.20 -0.85 0.02
q08 8 2571 2.24 0.87 2 2.15 0.00 1 5 4 1.05 1.48 0.02
q09 9 2571 2.85 1.26 3 2.83 1.48 1 5 4 -0.06 -1.15 0.02
q10 10 2571 2.28 0.88 2 2.21 0.00 1 5 4 0.83 0.59 0.02
q11 11 2571 2.26 0.88 2 2.18 0.00 1 5 4 0.84 0.90 0.02
q12 12 2571 3.16 0.92 3 3.12 1.48 1 5 4 0.18 -0.21 0.02
q13 13 2571 2.45 0.95 2 2.40 1.48 1 5 4 0.61 0.04 0.02
q14 14 2571 2.88 1.00 3 2.84 1.48 1 5 4 0.29 -0.36 0.02
q15 15 2571 2.77 1.01 3 2.72 1.48 1 5 4 0.43 -0.45 0.02
q16 16 2571 2.88 0.92 3 2.82 1.48 1 5 4 0.40 -0.13 0.02
q17 17 2571 2.47 0.88 2 2.40 0.00 1 5 4 0.76 0.49 0.02
q18 18 2571 2.57 1.05 2 2.51 1.48 1 5 4 0.48 -0.23 0.02
q19 19 2571 2.29 1.10 2 2.21 1.48 1 5 4 0.48 -0.72 0.02
q20 20 2571 3.62 1.04 4 3.68 1.48 1 5 4 -0.39 -0.65 0.02
q21 21 2571 3.17 0.98 3 3.12 1.48 1 5 4 0.12 -0.78 0.02
q22 22 2571 2.89 1.04 3 2.93 1.48 1 5 4 -0.07 -0.68 0.02
q23 23 2571 3.43 1.04 4 3.49 1.48 1 5 4 -0.59 -0.16 0.02
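The Curran et al. cutoffs above can be checked programmatically. The base-R sketch below (not the psych output above) flags severe univariate nonnormality on a simulated 1-5 ordinal item, since saq_data.xlsx itself is not reproduced here; the moment formulas use the sample SD as a simple approximation.

```r
# Minimal sketch of the Curran et al. (1996) screen: flag |skew| > 2.0
# or excess kurtosis > 7.0. Simulated right-skewed 1-5 ordinal item.
set.seed(905)
item <- sample(1:5, 500, replace = TRUE, prob = c(.50, .25, .15, .07, .03))

skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3
excess_kurtosis <- function(v) mean((v - mean(v))^4) / sd(v)^4 - 3

severe <- abs(skewness(item)) > 2.0 | excess_kurtosis(item) > 7.0
severe
```

Running the same two helper functions over each column of df would reproduce the skew/kurtosis screen reported by describe().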
Summary of Assumption #4 (Linear relationship between the variables and underlying normal distribution) checking: Most of the variables are not normally distributed. However, this is typical for ordinal data. No concerns about univariate skewness or kurtosis (but this was done using tests meant for continuous variables, so these findings should be interpreted with caution).
Assumption 5: Continuous Variables & Multivariate Normality
Mardia’s multivariate test for skewness and kurtosis.
Our data is positively skewed and leptokurtic (peaked)
- Multivariate skew = 20.74 (p < 0.001)
- Multivariate kurtosis = 685.61 (p < 0.001)
- According to Bentler (2005), multivariate kurtosis > 3.0 to 5.0 might bias factor analysis results
library(QuantPsyc) #package to test for Mardia's multivariate skew and kurtosis
mardia_results <- mult.norm(df)
mardia_results$mult.test
           Beta-hat      kappa p-val
Skewness 20.74316 8888.44605 0
Kurtosis 685.61377 82.69539 0
Plotting Mardia’s skewness
mardia(df, na.rm = TRUE, plot = TRUE)
Call: mardia(x = df, na.rm = TRUE, plot = TRUE)
Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 2571 num.vars = 23
b1p = 20.74 skew = 8888.45 with probability <= 0
small sample skew = 8899.68 with probability <= 0
b2p = 685.61 kurtosis = 82.7 with probability <= 0
Robust Mahalanobis Distance
Robust MD is more appropriate for ordinal data because of its robustness to outliers and lack of assumptions about the underlying distribution (i.e., better suited for non-normally distributed data with potential outliers)
Results (see below) of RobustMD show 10 outliers, which we will remove before starting EFA
library(faoutlier) #for robustMD() function
out = robustMD(df, method = "mcd")
print(out)
          mah p  sig
1117 178.2167 0 ****
84 160.6360 0 ****
2051 130.5956 0 ****
285 129.8526 0 ****
1278 119.9712 0 ****
678 117.8172 0 ****
808 116.4562 0 ****
1285 106.9913 0 ****
680 106.0960 0 ****
346 101.0658 0 ****
Plotting Robust MD
plot(out, type = 'qqplot')
Removing outliers (based on Robust MD Results)
#rows with outliers
outlier_indices <- c(1117, 84, 2051, 285, 1278, 678, 808, 1285, 680, 346)
#remove outliers
df_clean <- df %>%
  filter(!row_number() %in% outlier_indices)
Summary of Assumption #5 (Variables are ideally continuous, and if they are continuous, should have multivariate normality) checking: Both Mardia’s skewness/kurtosis tests and the Mahalanobis Distance test are intended for continuous variables. Since our data is ordinal, Mardia’s skewness/kurtosis results make sense because they suggest that our data are not multivariate normal. As a result, we should proceed with a polychoric matrix so as not to bias EFA results. To accommodate our ordinal data, we used Robust Mahalanobis Distance (instead of Mahalanobis Distance, which is intended for continuous variables) to identify multivariate outliers, and we removed the 10 outliers it flagged.
TO NOTE: It is often not appropriate to pretend that categorical variables are continuous (Flora et al., 2012). Since we are working with ordinal/categorical data, we will proceed with a polychoric matrix.
Polychoric correlations assume that a normally distributed continuous, but unobservable, latent variable underlies the observed ordinal variable. Polychoric correlations are a maximum likelihood estimate of the Pearson correlations for those underlying normally distributed continuous variables (Basto & Pereira, 2012).
- Pearson correlations assume normality, which requires continuous scores. Thus, categorical scores are not, by definition, normally distributed (Bandalos, 2018; Hancock & Liu, 2012; Puth et al., 2015; Walsh, 1996).
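The latent-variable idea behind polychorics can be illustrated with a small simulation: discretizing two correlated normals into ordinal categories attenuates the Pearson correlation computed on the ordinal scores, which is exactly the bias a polychoric estimate corrects. (Simulated data; the thresholds below are arbitrary illustrative choices.)

```r
# Sketch: ordinal Pearson correlations understate the latent correlation.
set.seed(1)
n   <- 5000
rho <- 0.5
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # latent correlation ~= 0.5
cuts <- c(-Inf, -1, 0, 1, Inf)                # thresholds -> 4 ordinal categories
o1 <- cut(z1, cuts, labels = FALSE)
o2 <- cut(z2, cuts, labels = FALSE)
cor(z1, z2)   # close to 0.5 (the latent correlation)
cor(o1, o2)   # attenuated; a polychoric estimate would recover ~0.5
```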
Converting to Polychoric Matrix
output_poly <- polychoric(df_clean)
#extract polychoric matrix from output object
poly = output_poly$rho
#preview of polychoric matrix
head(poly)
          q01        q02        q03        q04        q05        q06
q01 1.0000000 -0.1332690 -0.3795848 0.4880221 0.4612646 0.2437908
q02 -0.1332690 1.0000000 0.4085532 -0.1543015 -0.1516755 -0.1025360
q03 -0.3795848 0.4085532 1.0000000 -0.4215101 -0.3370588 -0.2457773
q04 0.4880221 -0.1543015 -0.4215101 1.0000000 0.4603872 0.3096884
q05 0.4612646 -0.1516755 -0.3370588 0.4603872 1.0000000 0.2948085
q06 0.2437908 -0.1025360 -0.2457773 0.3096884 0.2948085 1.0000000
q07 q08 q09 q10 q11 q12
q01 0.3406471 0.38772259 -0.1003074 0.2423981 0.4168383 0.3900703
q02 -0.2033396 -0.05290357 0.3955990 -0.1031174 -0.1933953 -0.2660150
q03 -0.4191982 -0.28861557 0.3247646 -0.2040251 -0.3971147 -0.4562483
q04 0.4481132 0.39817350 -0.1389703 0.2356322 0.4142662 0.4898198
q05 0.3737677 0.31855830 -0.1021172 0.2893173 0.3451599 0.3934881
q06 0.5840649 0.27911043 -0.1287985 0.3849408 0.3952384 0.3527046
q13 q14 q15 q16 q17 q18
q01 0.3953091 0.3911645 0.2684441 0.5504146 0.4258216 0.4010071
q02 -0.2032315 -0.2202330 -0.2188470 -0.2254519 -0.1232949 -0.2178279
q03 -0.3554002 -0.4133536 -0.3408011 -0.4672952 -0.3643531 -0.4170851
q04 0.3743790 0.3936631 0.3674473 0.4566244 0.4229458 0.4249714
q05 0.3368028 0.3581356 0.2835309 0.4438527 0.3593412 0.3663981
q06 0.5105541 0.4584889 0.4067497 0.2761093 0.3277384 0.5667686
q19 q20 q21 q22 q23
q01 -0.2172947 0.2459422 0.3673493 -0.1095262 -0.006225213
q02 0.2713114 -0.2674744 -0.2672002 0.2920950 0.111845276
q03 0.3890594 -0.3667907 -0.4618645 0.2250893 0.153367423
q04 -0.2155679 0.2795362 0.4490687 -0.1089267 -0.028711775
q05 -0.1944819 0.2322192 0.3828512 -0.1409451 -0.047309978
q06 -0.1968400 0.1122415 0.3094600 -0.1912809 -0.087463934
Is EFA Appropriate?
Ideally, we will have many correlations above 0.3, which is appropriate for EFA. The correlation matrix can be visually scanned to ensure that there are several coefficients ≥ .30 (Hair et al., 2019; Tabachnick & Fidell, 2019).
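The visual scan can also be done numerically. A minimal sketch on a toy correlation matrix (in the analysis this would be run on `poly`):

```r
# Sketch: compute the share of off-diagonal correlations with |r| >= .30.
R <- matrix(c(1.00, 0.45, 0.32,
              0.45, 1.00, 0.28,
              0.32, 0.28, 1.00), nrow = 3, byrow = TRUE)
off_diag   <- R[lower.tri(R)]                # unique pairwise correlations
prop_large <- mean(abs(off_diag) >= 0.30)    # proportion meeting the cutoff
prop_large
```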
Polychoric Correlation Graph (orange = pos. correlation, green = negative correlation, lines increase in thickness as correlation strength increases)
library(qgraph) #package to create correlation graph
qgraph(poly, cut = 0.3, details = TRUE, posCol = "orangered", negCol = "green", labels = names(df_clean))
Another Correlation Plot
corPlot(poly, diag = F, zlim = c(.3, 1), upper = F, numbers = TRUE, cex.axis = 0.5)
The figures show that most correlations are above 0.3, which is reassuring, since that’s ideal for EFA. :)
KMO: Kaiser-Meyer-Olkin Factor Adequacy
- KMO is used to measure sampling adequacy (Kaiser, 1974) and to ensure that there is enough shared variance. In other words, the KMO indicates whether variables are likely to be factorizable (values closer to 1 means that the variables are more suitable for FA).
KMO(poly)
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = poly)
Overall MSA = 0.93
MSA for each item =
q01 q02 q03 q04 q05 q06 q07 q08 q09 q10 q11 q12 q13 q14 q15 q16
0.93 0.88 0.95 0.95 0.96 0.88 0.93 0.88 0.84 0.95 0.90 0.95 0.95 0.97 0.94 0.93
q17 q18 q19 q20 q21 q22 q23
0.93 0.95 0.94 0.89 0.93 0.88 0.75
Most of the KMO values are above 0.85, which is ideal! :)
Bartlett’s Test of Sphericity
- Bartlett’s Test of Sphericity is used to ensure that the correlations between variables are significantly different from zero.
cortest.bartlett(poly, n = 2561)
$chisq
[1] 24379.88
$p.value
[1] 0
$df
[1] 253
Bartlett’s Test of Sphericity results indicate that we reject the hypothesis that the polychoric correlation matrix is an identity matrix (χ²(253) = 24,379.88, p < 0.001).
Determining the Number of Factors
Parallel Analysis
fa.parallel(poly, n.obs = 2561, fa = "pc", n.iter = 500) #n.iter = 500 means 500 simulated analyses are performed
Parallel analysis suggests that the number of factors = NA and the number of components = 4
Parallel Analysis suggests 4 factors
MAP: Minimum Average Partial
- Velicer’s MAP extracts components one at a time and, after each extraction, computes the average of the squared partial correlations among the observed variables after partialling out the components retained so far.
- The retained number of factors is the point at which this average squared partial correlation reaches its minimum. A lower MAP value indicates that the retained factors account for more of the shared (common) variance, leaving mostly unique variance in the residuals.
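The selection rule itself is just an argmin over the MAP values by factor count. A small sketch using hypothetical unrounded values (the printed VSS output rounds these to two decimals):

```r
# Sketch: MAP retains the factor count at the minimum average squared
# partial correlation. Values below are illustrative, not the real output.
map_values <- c(0.0152, 0.0148, 0.0160, 0.0146, 0.0170,
                0.0210, 0.0260, 0.0320)
n_factors <- which.min(map_values)
n_factors   # 4
```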
VSS(poly, rotate = "promax", fm = "pc", plot = TRUE, n.obs = 2561)
Very Simple Structure
Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm,
n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
VSS complexity 1 achieves a maximimum of 0.83 with 1 factors
VSS complexity 2 achieves a maximimum of 0.88 with 2 factors
The Velicer MAP achieves a minimum of 0.01 with 4 factors
Warning in min(x$vss.stats[, "BIC"], na.rm = TRUE): no non-missing arguments to
min; returning Inf
BIC achieves a minimum of Inf with factors
Warning in min(x$vss.stats[, "SABIC"], na.rm = TRUE): no non-missing arguments
to min; returning Inf
Sample Size adjusted BIC achieves a minimum of Inf with factors
Statistics by number of factors
vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex eChisq
1 0.83 0.00 0.015 0 NA NA 13.4 0.83 NA NA NA NA NA
2 0.80 0.88 0.015 0 NA NA 9.9 0.88 NA NA NA NA NA
3 0.73 0.86 0.016 0 NA NA 8.0 0.90 NA NA NA NA NA
4 0.76 0.86 0.015 0 NA NA 6.5 0.92 NA NA NA NA NA
5 0.77 0.86 0.017 0 NA NA 5.5 0.93 NA NA NA NA NA
6 0.65 0.84 0.021 0 NA NA 4.8 0.94 NA NA NA NA NA
7 0.76 0.86 0.026 0 NA NA 4.2 0.95 NA NA NA NA NA
8 0.78 0.86 0.032 0 NA NA 3.6 0.95 NA NA NA NA NA
SRMR eCRMS eBIC
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
# note: promax is a type of oblique rotation, which allows the factors to be correlated
The Velicer MAP results achieved a minimum of 0.01 with 4 factors
Scree Plot
scree(poly, pc = TRUE, factors = FALSE, hline = -1)
Scree Plot suggests 2-3 factors
Network Plot
library(EGAnet) #EGAnet version 2.0.5
EGA(poly, n = 2561, plot.EGA = TRUE, steps = 10)
Model: GLASSO (EBIC with gamma = 0.5)
Correlations: auto
Lambda: 0.0708977369847798 (n = 100, ratio = 0.1)
Number of nodes: 23
Number of edges: 131
Edge density: 0.518
Non-zero edge weights:
M SD Min Max
0.056 0.087 -0.115 0.401
----
Algorithm: Walktrap
Number of communities: 4
q01 q02 q03 q04 q05 q06 q07 q08 q09 q10 q11 q12 q13 q14 q15 q16 q17 q18 q19 q20
1 2 3 1 1 1 1 4 2 1 4 1 1 1 1 1 4 1 3 3
q21 q22 q23
1 2 2
----
Unidimensional Method: Louvain
Unidimensional: No
----
TEFI: -18.853
Network graph suggests 4 factors
How many factors?
- Parallel Analysis: 4
- MAP: 4
- Scree Plot: 2-4
- Network Plot: 4
We will fit 3 EFA models with 2, 3, and 4 factors.
EFA MODELS BELOW
We will use oblimin rotation, which is an oblique rotation method in factor analysis that allows for correlated factors, preserving the original correlations among factors. It estimates inter-factor correlations, providing insights into the underlying structure of the data, particularly useful when factors are expected to be related.
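Base R can illustrate the idea of an oblique rotation without extra packages. The sketch below uses stats::factanal with promax (another oblique method, standing in here for oblimin) on simulated two-factor data whose factors correlate about 0.4; all names and values are illustrative, not part of the SAQ analysis.

```r
# Sketch: oblique rotation on simulated correlated-factor data.
set.seed(42)
n  <- 400
F1 <- rnorm(n)
F2 <- 0.4 * F1 + sqrt(1 - 0.4^2) * rnorm(n)   # factors correlate ~0.4
X  <- cbind(sapply(1:3, function(i) F1 + rnorm(n, sd = 0.7)),
            sapply(1:3, function(i) F2 + rnorm(n, sd = 0.7)))
fit <- factanal(X, factors = 2, rotation = "promax")
fit$loadings   # each block of three items loads mainly on its own factor
```

Because the rotation is oblique, the recovered factors are allowed to correlate, matching the data-generating structure; an orthogonal rotation such as varimax would force that correlation to zero.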
EFA with 2 factors
f2 = fa(poly, nfactors = 2, rotate = "oblimin", residuals = T, SMC = T, fm = "pa", n.obs = 2561)
print(f2, sort = TRUE, cut = 0, digits = 2)
Factor Analysis using method = pa
Call: fa(r = poly, nfactors = 2, n.obs = 2561, rotate = "oblimin",
residuals = T, SMC = T, fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
item PA1 PA2 h2 u2 com
q17 17 0.79 0.21 0.568 0.43 1.1
q11 11 0.76 0.12 0.533 0.47 1.1
q08 8 0.74 0.30 0.497 0.50 1.3
q18 18 0.65 -0.15 0.511 0.49 1.1
q13 13 0.65 -0.10 0.471 0.53 1.0
q07 7 0.64 -0.12 0.480 0.52 1.1
q04 4 0.63 -0.03 0.412 0.59 1.0
q16 16 0.62 -0.15 0.469 0.53 1.1
q01 1 0.61 0.02 0.370 0.63 1.0
q14 14 0.61 -0.14 0.448 0.55 1.1
q12 12 0.58 -0.23 0.468 0.53 1.3
q21 21 0.57 -0.22 0.446 0.55 1.3
q06 6 0.56 -0.04 0.328 0.67 1.0
q05 5 0.55 -0.04 0.317 0.68 1.0
q15 15 0.54 -0.12 0.347 0.65 1.1
q03 3 -0.44 0.42 0.481 0.52 2.0
q10 10 0.42 -0.05 0.190 0.81 1.0
q20 20 0.31 -0.27 0.224 0.78 2.0
q09 9 0.01 0.58 0.336 0.66 1.0
q02 2 -0.07 0.57 0.354 0.65 1.0
q19 19 -0.24 0.40 0.282 0.72 1.6
q22 22 -0.11 0.38 0.183 0.82 1.2
q23 23 -0.02 0.23 0.056 0.94 1.0
PA1 PA2
SS loadings 6.90 1.87
Proportion Var 0.30 0.08
Cumulative Var 0.30 0.38
Proportion Explained 0.79 0.21
Cumulative Proportion 0.79 1.00
With factor correlations of
PA1 PA2
PA1 1.00 -0.31
PA2 -0.31 1.00
Mean item complexity = 1.2
Test of the hypothesis that 2 factors are sufficient.
df null model = 253 with the objective function = 9.56 with Chi Square = 24379.88
df of the model are 208 and the objective function was 1.83
The root mean square of the residuals (RMSR) is 0.05
The df corrected root mean square of the residuals is 0.06
The harmonic n.obs is 2561 with the empirical chi square 3912.88 with prob < 0
The total n.obs was 2561 with Likelihood Chi Square = 4668.31 with prob < 0
Tucker Lewis Index of factoring reliability = 0.775
RMSEA index = 0.092 and the 90 % confidence intervals are 0.089 0.094
BIC = 3035.9
Fit based upon off diagonal values = 0.97
Measures of factor score adequacy
PA1 PA2
Correlation of (regression) scores with factors 0.96 0.85
Multiple R square of scores with factors 0.92 0.72
Minimum correlation of possible factor scores 0.85 0.45
EFA with 2 factors PLOT
fa.diagram(f2, cut = 0, sort = TRUE)
Most variables seem to load onto factor 1
EFA with 3 factors
f3 = fa(poly, nfactors = 3, rotate = "oblimin", residuals = T, SMC = T, fm = "pa", n.obs = 2561)
print(f3, sort = TRUE, cut = 0, digits = 2)
Factor Analysis using method = pa
Call: fa(r = poly, nfactors = 3, n.obs = 2561, rotate = "oblimin",
residuals = T, SMC = T, fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
item PA1 PA3 PA2 h2 u2 com
q08 8 0.85 -0.05 0.11 0.614 0.39 1.0
q17 17 0.76 0.07 0.02 0.623 0.38 1.0
q11 11 0.68 0.09 -0.06 0.581 0.42 1.0
q01 1 0.49 0.09 -0.15 0.390 0.61 1.2
q04 4 0.43 0.17 -0.18 0.417 0.58 1.6
q16 16 0.39 0.15 -0.31 0.483 0.52 2.2
q05 5 0.34 0.19 -0.15 0.316 0.68 2.0
q06 6 -0.10 0.90 0.14 0.627 0.37 1.1
q18 18 0.08 0.66 -0.12 0.597 0.40 1.1
q07 7 0.12 0.60 -0.11 0.543 0.46 1.2
q13 13 0.16 0.56 -0.09 0.520 0.48 1.2
q14 14 0.16 0.49 -0.16 0.474 0.53 1.4
q10 10 0.07 0.42 -0.02 0.226 0.77 1.1
q12 12 0.18 0.36 -0.30 0.470 0.53 2.5
q15 15 0.21 0.34 -0.17 0.349 0.65 2.2
q02 2 0.11 0.03 0.66 0.372 0.63 1.1
q09 9 0.23 -0.04 0.60 0.323 0.68 1.3
q03 3 -0.22 -0.03 0.60 0.529 0.47 1.3
q19 19 -0.02 -0.09 0.48 0.284 0.72 1.1
q20 20 0.24 -0.09 -0.44 0.282 0.72 1.6
q22 22 0.11 -0.12 0.39 0.174 0.83 1.4
q21 21 0.31 0.17 -0.36 0.455 0.54 2.4
q23 23 0.07 -0.02 0.24 0.052 0.95 1.2
PA1 PA3 PA2
SS loadings 3.51 3.42 2.77
Proportion Var 0.15 0.15 0.12
Cumulative Var 0.15 0.30 0.42
Proportion Explained 0.36 0.35 0.29
Cumulative Proportion 0.36 0.71 1.00
With factor correlations of
PA1 PA3 PA2
PA1 1.00 0.57 -0.39
PA3 0.57 1.00 -0.47
PA2 -0.39 -0.47 1.00
Mean item complexity = 1.4
Test of the hypothesis that 3 factors are sufficient.
df null model = 253 with the objective function = 9.56 with Chi Square = 24379.88
df of the model are 187 and the objective function was 1.2
The root mean square of the residuals (RMSR) is 0.04
The df corrected root mean square of the residuals is 0.05
The harmonic n.obs is 2561 with the empirical chi square 2346.04 with prob < 0
The total n.obs was 2561 with Likelihood Chi Square = 3062.88 with prob < 0
Tucker Lewis Index of factoring reliability = 0.839
RMSEA index = 0.077 and the 90 % confidence intervals are 0.075 0.08
BIC = 1595.28
Fit based upon off diagonal values = 0.98
Measures of factor score adequacy
PA1 PA3 PA2
Correlation of (regression) scores with factors 0.93 0.93 0.89
Multiple R square of scores with factors 0.87 0.87 0.80
Minimum correlation of possible factor scores 0.74 0.74 0.60
EFA with 3 factors PLOT
fa.diagram(f3, cut = 0, sort = TRUE)
EFA with 4 factors
f4 = fa(poly, nfactors = 4, rotate = "oblimin", residuals = T, SMC = T, fm = "pa", n.obs = 2561)
print(f4, sort = TRUE, cut = 0, digits = 2)
Factor Analysis using method = pa
Call: fa(r = poly, nfactors = 4, n.obs = 2561, rotate = "oblimin",
residuals = T, SMC = T, fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
item PA1 PA3 PA4 PA2 h2 u2 com
q21 21 0.60 0.08 0.05 -0.10 0.51 0.49 1.1
q04 4 0.54 0.09 0.17 0.05 0.46 0.54 1.3
q01 1 0.52 0.02 0.23 0.07 0.43 0.57 1.4
q12 12 0.51 0.27 -0.02 -0.09 0.51 0.49 1.6
q16 16 0.51 0.08 0.17 -0.09 0.50 0.50 1.4
q20 20 0.49 -0.15 0.04 -0.20 0.31 0.69 1.6
q05 5 0.44 0.12 0.13 0.04 0.34 0.66 1.3
q03 3 -0.43 0.03 -0.10 0.39 0.52 0.48 2.1
q06 6 -0.11 0.87 0.02 0.00 0.69 0.31 1.0
q18 18 0.31 0.56 0.00 -0.04 0.60 0.40 1.6
q07 7 0.29 0.51 0.04 -0.03 0.54 0.46 1.6
q13 13 0.18 0.49 0.15 -0.07 0.52 0.48 1.5
q14 14 0.36 0.40 0.03 -0.04 0.48 0.52 2.0
q10 10 0.01 0.39 0.12 -0.06 0.24 0.76 1.2
q15 15 0.15 0.29 0.20 -0.14 0.35 0.65 2.9
q08 8 -0.01 -0.06 0.90 0.07 0.73 0.27 1.0
q11 11 -0.03 0.07 0.79 -0.12 0.71 0.29 1.1
q17 17 0.11 0.06 0.72 0.02 0.65 0.35 1.1
q09 9 0.03 -0.02 0.07 0.63 0.38 0.62 1.0
q02 2 -0.15 0.06 0.04 0.58 0.38 0.62 1.2
q22 22 0.18 -0.13 -0.10 0.51 0.29 0.71 1.5
q19 19 -0.19 -0.05 -0.03 0.39 0.28 0.72 1.5
q23 23 0.23 -0.04 -0.13 0.37 0.14 0.86 2.0
PA1 PA3 PA4 PA2
SS loadings 3.38 2.71 2.67 1.81
Proportion Var 0.15 0.12 0.12 0.08
Cumulative Var 0.15 0.26 0.38 0.46
Proportion Explained 0.32 0.26 0.25 0.17
Cumulative Proportion 0.32 0.58 0.83 1.00
With factor correlations of
PA1 PA3 PA4 PA2
PA1 1.00 0.48 0.52 -0.39
PA3 0.48 1.00 0.48 -0.31
PA4 0.52 0.48 1.00 -0.21
PA2 -0.39 -0.31 -0.21 1.00
Mean item complexity = 1.5
Test of the hypothesis that 4 factors are sufficient.
df null model = 253 with the objective function = 9.56 with Chi Square = 24379.88
df of the model are 167 and the objective function was 0.7
The root mean square of the residuals (RMSR) is 0.03
The df corrected root mean square of the residuals is 0.04
The harmonic n.obs is 2561 with the empirical chi square 1094.96 with prob < 3.8e-136
The total n.obs was 2561 with Likelihood Chi Square = 1777.53 with prob < 5e-267
Tucker Lewis Index of factoring reliability = 0.899
RMSEA index = 0.061 and the 90 % confidence intervals are 0.059 0.064
BIC = 466.89
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy
PA1 PA3 PA4 PA2
Correlation of (regression) scores with factors 0.92 0.92 0.94 0.85
Multiple R square of scores with factors 0.84 0.85 0.88 0.73
Minimum correlation of possible factor scores 0.68 0.69 0.77 0.45
The root mean square of the residuals (RMSR) for model with 4 factors: 0.03
EFA with 4 factors PLOT
fa.diagram(f4, cut = 0, sort = TRUE)
Summary of RMSR values:
- 2 factor model: 0.05
- 3 factor model: 0.04
- 4 factor model: 0.03
The smaller the RMSR (Root Mean Square Residual) value, the better: RMSR is a measure of overall residual misfit, and values less than .08 are preferred (Brown, 2013). The RMSR results suggest that the 4-factor model fits the data best.
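The model comparison can also be done programmatically by collecting the reported indices into one table. With psych fa objects these values live in fields such as f2$rms and f2$BIC (field names assumed, not shown in the printouts above); here the reported numbers are entered directly:

```r
# Sketch: tabulate the fit indices reported above and select the best model.
fits <- data.frame(nfactors = c(2, 3, 4),
                   RMSR     = c(0.05, 0.04, 0.03),
                   BIC      = c(3035.90, 1595.28, 466.89))
best <- fits$nfactors[which.min(fits$RMSR)]
best   # 4; agrees with the BIC ordering as well
```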